{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "Tce3stUlHN0L",
      "metadata": {
        "id": "Tce3stUlHN0L"
      },
      "source": [
        "##### Copyright 2024 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "id": "tuOe1ymfHZPu",
      "metadata": {
        "cellView": "form",
        "id": "tuOe1ymfHZPu"
      },
      "outputs": [],
      "source": [
        "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "introduction",
      "metadata": {
        "id": "introduction"
      },
      "source": [
        "# Building a RAG application using Gemma with Elasticsearch, Ollama, and LangChain\n",
        "\n",
        "This tutorial guides you through building a **Retrieval-Augmented Generation (RAG)** application using the **Gemma 2 9B** model, **LangChain**, **Ollama**, and **Elasticsearch**. Each step is explained in detail, so even if you're new to RAG or Large Language Models (LLMs), you'll be able to follow along and build your own local AI application."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "introduction-gemma",
      "metadata": {
        "id": "introduction-gemma"
      },
      "source": [
        "## Introduction\n",
        "\n",
        "**Retrieval-Augmented Generation (RAG)** is a technique that combines large language models (LLMs) with external knowledge sources to generate more accurate and contextually relevant responses. It involves two main components:\n",
        "\n",
        "* **Retriever**: Based on the user's query, the retriever fetches relevant documents from a dataset to provide additional context to the LLM.\n",
        "\n",
        "* **Generator**: The LLM uses the retrieved context along with the user's query to generate accurate and coherent responses.\n",
        "\n",
        "By combining retrieval with generation, RAG systems can produce responses that are both informed and coherent, making them ideal for tasks like question answering over custom datasets.\n",
        "\n",
        "[**Gemma**](https://ai.google.dev/gemma) is a family of lightweight, state-of-the-art open language models from Google. Built from the same research and technology used to create the Gemini models, Gemma models are text-to-text, decoder-only large language models (LLMs) available in English, with open weights, pre-trained variants, and instruction-tuned variants.\n",
        "\n",
        "The **Gemma 2 9B IT Q6_K** model used in this tutorial is an instruction-tuned Gemma 2 9B model quantized to roughly 6 bits per weight. Quantization shrinks the model's memory footprint, typically at a small cost in output quality, which makes it practical to run the model on limited hardware such as a laptop, a desktop, or your own cloud infrastructure.\n",
        "\n",
        "[**LangChain**](https://python.langchain.com/) is a framework for developing applications powered by language models. It provides a suite of tools and integrations that simplify building complex AI applications, such as chatbots, question-answering systems, and more. LangChain allows you to chain together various components like prompts, LLMs, and retrievers to create sophisticated pipelines.\n",
        "\n",
        "[**Ollama**](https://ollama.ai/) is a tool that simplifies running language models locally. It allows you to manage and serve multiple models efficiently, making it easier to deploy and test AI models on your machine. With Ollama, you can switch between different models and versions seamlessly, providing flexibility in development and experimentation. You can browse the available Gemma 2 models at the [Ollama Gemma 2 Model Catalog](https://ollama.com/library/gemma2).\n",
        "\n",
        "[**Elasticsearch**](https://www.elastic.co/elasticsearch/) is a powerful open-source search and analytics engine. It allows you to store, search, and analyze large volumes of data quickly and in near real-time. In this tutorial, Elasticsearch serves as the data source and vector store for our RAG application, enabling efficient retrieval of relevant documents based on user queries.\n",
        "\n",
        "Together, these tools let you build a self-hosted RAG application for tasks like question answering over custom datasets, running entirely on a modest GPU such as the T4.\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/[Gemma_2]Using_with_Elasticsearch_and_LangChain.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "setup",
      "metadata": {
        "id": "setup"
      },
      "source": [
        "## Setup\n",
        "\n",
        "Before you begin, make sure you have a Google Colab account."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "setup-colab-runtime",
      "metadata": {
        "id": "setup-colab-runtime"
      },
      "source": [
        "### Select the Colab Runtime\n",
        "\n",
        "First, you'll need to set up your Google Colab environment:\n",
        "\n",
        "1. **Open Google Colab** and create a new notebook.\n",
        "2. In the upper-right corner of the Colab window, click on the **▾ (Additional connection options)** button.\n",
        "3. Select **Change runtime type**.\n",
        "4. Under **Hardware accelerator**, choose **GPU**.\n",
        "5. Ensure that the **GPU type** is set to **T4**.\n",
        "\n",
        "This will provide sufficient resources to run the Gemma 2 9B model.\n",
        "\n",
        "### Gemma Setup\n",
        "\n",
        "Before diving into the tutorial, let's set up Gemma:\n",
        "\n",
        "1. **Create a Hugging Face Account**: If you don't have one, you can sign up for a free account [here](https://huggingface.co/join).\n",
        "2. **Access the Gemma Model**: Visit the [Gemma model page](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) and accept the usage conditions.\n",
        "3. **Generate a Hugging Face Token**: Go to your Hugging Face [settings page](https://huggingface.co/settings/tokens) and generate a new access token (a `read` token is sufficient for downloading models).\n",
        "\n",
        "**Once you've completed these steps, you're ready to move on to the next section where you'll set up environment variables in your Colab environment.**"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "configure-credentials",
      "metadata": {
        "id": "configure-credentials"
      },
      "source": [
        "### Configure Your Credentials\n",
        "\n",
        "\n",
        "Next, securely store your Hugging Face token using the Colab Secrets manager:\n",
        "\n",
        "1. Open your Google Colab notebook and click on the 🔑 Secrets tab in the left panel. <img src=\"https://storage.googleapis.com/generativeai-downloads/images/secrets.jpg\" alt=\"The Secrets tab is found on the left panel.\" width=50%>\n",
        "2. **Add Hugging Face Token**:\n",
        "   - Create a new secret named `HF_TOKEN`.\n",
        "   - Paste your Hugging Face token into the Value input box.\n",
        "   - Toggle the button to allow notebook access to the secret.\n",
        "\n",
        "Now, set the environment variables in your notebook:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "id": "set-env-vars",
      "metadata": {
        "id": "set-env-vars"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "from google.colab import userdata\n",
        "\n",
        "# Set Hugging Face token\n",
        "os.environ[\"HF_TOKEN\"] = userdata.get(\"HF_TOKEN\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "OkKjRMosdF_a",
      "metadata": {
        "id": "OkKjRMosdF_a"
      },
      "source": [
        "This code retrieves your Hugging Face token from Colab Secrets and sets it as the `HF_TOKEN` environment variable, which you will use later in the tutorial."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "6B4b-MNldLzF",
      "metadata": {
        "id": "6B4b-MNldLzF"
      },
      "source": [
        "### Installing Dependencies\n"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "install-dependencies",
      "metadata": {
        "id": "install-dependencies"
      },
      "source": [
        "Run the following cell to install the required dependencies:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "id": "install-deps",
      "metadata": {
        "id": "install-deps"
      },
      "outputs": [],
      "source": [
        "!pip install -q langchain tiktoken\n",
        "!pip install -q langchainhub langchain-huggingface langchain-text-splitters\n",
        "# langchain-huggingface requires sentence-transformers>=2.6.0\n",
        "!pip install -q \"sentence-transformers>=2.6.0\"\n",
        "!pip install -q -U huggingface-hub\n",
        "!pip install -q -U elasticsearch==8.15.1 langchain-elasticsearch==0.3.0\n",
        "!pip install -q langchain-ollama"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "pFZgAcOWhGnW",
      "metadata": {
        "id": "pFZgAcOWhGnW"
      },
      "source": [
        "### Import Dependencies"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "id": "8S7QO6QthJlm",
      "metadata": {
        "id": "8S7QO6QthJlm"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import time\n",
        "from google.colab import userdata\n",
        "from typing import Dict\n",
        "\n",
        "from elasticsearch import Elasticsearch\n",
        "\n",
        "from langchain_ollama.chat_models import ChatOllama\n",
        "from langchain_huggingface import HuggingFaceEmbeddings\n",
        "from langchain_elasticsearch import ElasticsearchStore, ElasticsearchRetriever\n",
        "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
        "from langchain import hub\n",
        "from langchain_core.output_parsers import BaseTransformOutputParser\n",
        "from langchain_core.runnables import RunnablePassthrough"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "gemma-section",
      "metadata": {
        "id": "gemma-section"
      },
      "source": [
        "## Gemma\n",
        "\n",
        "Gemma models are designed to be lightweight yet powerful, making them suitable for environments with limited resources. They support various text generation tasks and are available in instruction-tuned variants, which means they've been trained to follow instructions provided in prompts.\n",
        "\n",
        "#### Prompt Formatting\n",
        "\n",
        "Instruction-tuned models use specific control tokens to format prompts:\n",
        "\n",
        "- **`user`**: Indicates a user turn.\n",
        "- **`model`**: Indicates a model turn.\n",
        "- **`<start_of_turn>`**: Marks the beginning of a dialogue turn.\n",
        "- **`<end_of_turn>`**: Marks the end of a dialogue turn.\n",
        "\n",
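        "As a hedged illustration of how these tokens fit together (the question text is only an example), a single-turn prompt is laid out as shown here, ending with an open model turn for the model to complete:\n",
        "\n",
        "```\n",
        "<start_of_turn>user\n",
        "What is the capital of France?<end_of_turn>\n",
        "<start_of_turn>model\n",
        "```\n",
        "\n",
        "When the model is called through Ollama's chat interface, as in this notebook, the model's template normally applies this layout for you, so you rarely need to write the tokens by hand.\n",
        "\n",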
        "This formatting helps the model understand and generate conversational responses. Refer to the [official documentation](https://ai.google.dev/gemma/docs/formatting) for more details."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "install-ollama",
      "metadata": {
        "id": "install-ollama"
      },
      "source": [
        "### Installing and Running Ollama\n",
        "\n",
        "You will use Ollama to run the Gemma model locally.\n",
        "\n",
        "First, install Ollama by running:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "id": "install-ollama-code",
      "metadata": {
        "id": "install-ollama-code"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            ">>> Installing ollama to /usr/local\n",
            ">>> Downloading Linux amd64 bundle\n",
            "############################################################################################# 100.0%\n",
            ">>> Creating ollama user...\n",
            ">>> Adding ollama user to video group...\n",
            ">>> Adding current user to ollama group...\n",
            ">>> Creating ollama systemd service...\n",
            "WARNING: Unable to detect NVIDIA/AMD GPU. Install lspci or lshw to automatically detect and install GPU dependencies.\n",
            ">>> The Ollama API is now available at 127.0.0.1:11434.\n",
            ">>> Install complete. Run \"ollama\" from the command line.\n"
          ]
        }
      ],
      "source": [
        "!curl -fsSL https://ollama.com/install.sh | sh"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "5Qky7E72dacP",
      "metadata": {
        "id": "5Qky7E72dacP"
      },
      "source": [
        "Then, start the Ollama server in the background.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "id": "install-langchain-ollama-code",
      "metadata": {
        "id": "install-langchain-ollama-code"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "nohup: redirecting stderr to stdout\n"
          ]
        }
      ],
      "source": [
        "!nohup ollama serve > ollama.log &"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "ollama-gemma",
      "metadata": {
        "id": "ollama-gemma"
      },
      "source": [
        "Ollama provides a library of pre-configured models, including the Gemma 2 family, so you can switch between variants by changing the model tag.\n",
        "\n",
        "In this notebook, you'll use the [gemma2:9b-instruct-q6_K](https://ollama.com/library/gemma2:9b-instruct-q6_K) model. The following command downloads the model on first use, which can take a few minutes, and then asks it a test question:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "id": "test-ollama",
      "metadata": {
        "id": "test-ollama"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "The capital of France is **Paris**. 🇫🇷  \n",
            "\n",
            "\n"
          ]
        }
      ],
      "source": [
        "!ollama run gemma2:9b-instruct-q6_K \"What is the capital of France?\" 2> ollama.log"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "3jychWGrdp19",
      "metadata": {
        "id": "3jychWGrdp19"
      },
      "source": [
        "You should see the model's response in the output."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "integrate-gemma-langchain",
      "metadata": {
        "id": "integrate-gemma-langchain"
      },
      "source": [
        "### Integrate Gemma with LangChain\n",
        "\n",
        "Now, let's set up the Gemma model with LangChain:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "id": "initialize-llm",
      "metadata": {
        "id": "initialize-llm"
      },
      "outputs": [],
      "source": [
        "llm = ChatOllama(\n",
        "    model=\"gemma2:9b-instruct-q6_K\",\n",
        "    temperature=0.8\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "YH_Gw5YJeJ4k",
      "metadata": {
        "id": "YH_Gw5YJeJ4k"
      },
      "source": [
        "Test the LLM by asking a simple question:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "id": "test-llm",
      "metadata": {
        "id": "test-llm"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "The capital of France is **Paris**. 🇫🇷  \n",
            "\n"
          ]
        }
      ],
      "source": [
        "response = llm.invoke(\"What is the capital of France?\")\n",
        "print(response.content)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "setup-elasticsearch",
      "metadata": {
        "id": "setup-elasticsearch"
      },
      "source": [
        "## Setting up Elasticsearch\n",
        "\n",
        "Next, you'll set up a local instance of Elasticsearch to serve as the data source and vector store.\n",
        "\n",
        "**Note:** This tutorial runs Elasticsearch locally instead of using [Elastic Cloud](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud) to stay self-contained. Once the archive is downloaded, everything runs inside the Colab runtime, with no separate cloud deployment and no cloud costs.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "download-install-elasticsearch",
      "metadata": {
        "id": "download-install-elasticsearch"
      },
      "source": [
        "### Download and Install Elasticsearch\n",
        "\n",
        "First, you need to download and install Elasticsearch.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "id": "remove-elasticsearch",
      "metadata": {
        "id": "remove-elasticsearch"
      },
      "outputs": [],
      "source": [
        "# Removes any previous Elasticsearch installations:\n",
        "!rm -rf elasticsearch*"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "download-elasticsearch",
      "metadata": {
        "id": "download-elasticsearch"
      },
      "source": [
        "Download Elasticsearch version 8.15.1, verify its checksum, extract the archive, and set permissions.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "id": "4FMqe8NRfFHp",
      "metadata": {
        "id": "4FMqe8NRfFHp"
      },
      "outputs": [],
      "source": [
        "ESVERSION = \"8.15.1\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "download-elasticsearch-code",
      "metadata": {
        "id": "download-elasticsearch-code"
      },
      "outputs": [],
      "source": [
        "%%bash -s \"$ESVERSION\"\n",
        "export ESVERSION=$1\n",
        "\n",
        "# Download the Elasticsearch archive and its SHA-512 checksum file\n",
        "wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-${ESVERSION}-linux-x86_64.tar.gz\n",
        "wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-${ESVERSION}-linux-x86_64.tar.gz.sha512\n",
        "\n",
        "# Verify the archive's integrity with SHA-512 before extracting it\n",
        "shasum -a 512 -c elasticsearch-${ESVERSION}-linux-x86_64.tar.gz.sha512\n",
        "tar -xzf elasticsearch-${ESVERSION}-linux-x86_64.tar.gz\n",
        "\n",
        "# Set up user to run ES daemon and configure cgroups\n",
        "umount /sys/fs/cgroup\n",
        "apt install -y cgroup-tools\n",
        "sudo chown -R daemon:daemon elasticsearch-${ESVERSION}/"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "configure-elasticsearch",
      "metadata": {
        "id": "configure-elasticsearch"
      },
      "source": [
        "### Configure Elasticsearch\n",
        "\n",
        "For demonstration purposes, let's disable security settings.   \n",
        "**Note**: In a production environment, always enable security features.\n",
        "\n",
        "The following cell appends the required settings to the Elasticsearch configuration file, `elasticsearch.yml`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "id": "configure-elasticsearch-code",
      "metadata": {
        "id": "configure-elasticsearch-code"
      },
      "outputs": [],
      "source": [
        "with open(f'./elasticsearch-{ESVERSION}/config/elasticsearch.yml', 'a') as f:\n",
        "    f.write(\"xpack.security.enabled: false\\n\")\n",
        "    f.write(\"xpack.security.authc:\\n\")\n",
        "    f.write(\"  anonymous:\\n\")\n",
        "    f.write(\"    username: anonymous_user\\n\")\n",
        "    f.write(\"    roles: superuser\\n\")\n",
        "    f.write(\"    authz_exception: true\\n\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "OVJJf42gejqx",
      "metadata": {
        "id": "OVJJf42gejqx"
      },
      "source": [
        "If you want to verify that the **elasticsearch.yml** file is written correctly, you can uncomment and run the following code block."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "id": "JY7AmN20ekqN",
      "metadata": {
        "id": "JY7AmN20ekqN"
      },
      "outputs": [],
      "source": [
        "# with open(f'./elasticsearch-{ESVERSION}/config/elasticsearch.yml', 'r') as f:\n",
        "#     print(f.read())"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "run-elasticsearch",
      "metadata": {
        "id": "run-elasticsearch"
      },
      "source": [
        "### Run Elasticsearch\n",
        "\n",
        "Now, let's start Elasticsearch as a daemon process:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "id": "start-elasticsearch-code",
      "metadata": {
        "id": "start-elasticsearch-code"
      },
      "outputs": [],
      "source": [
        "%%bash --bg -s \"$ESVERSION\"\n",
        "\n",
        "export ESVERSION=$1\n",
        "\n",
        "sudo -H -u daemon elasticsearch-${ESVERSION}/bin/elasticsearch"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "wait-elasticsearch",
      "metadata": {
        "id": "wait-elasticsearch"
      },
      "source": [
        "Elasticsearch takes a while to start. The following cell sleeps for 60 seconds to give it enough time before you continue:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "id": "wait-elasticsearch-code",
      "metadata": {
        "id": "wait-elasticsearch-code"
      },
      "outputs": [],
      "source": [
        "time.sleep(60)"
      ]
    },
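    {
      "cell_type": "markdown",
      "id": "wait-elasticsearch-poll-sketch",
      "metadata": {
        "id": "wait-elasticsearch-poll-sketch"
      },
      "source": [
        "A fixed sleep can be too short on a slow runtime and longer than needed on a fast one. As an alternative, here is a minimal sketch that polls an HTTP endpoint until it responds. The helper name `wait_for_http` and its `timeout` and `interval` defaults are illustrative assumptions, not part of the original tutorial:\n",
        "\n",
        "```python\n",
        "import time\n",
        "import urllib.error\n",
        "import urllib.request\n",
        "\n",
        "\n",
        "def wait_for_http(url, timeout=120.0, interval=2.0):\n",
        "    \"\"\"Return True once `url` answers with HTTP 200, or False after `timeout` seconds.\"\"\"\n",
        "    deadline = time.monotonic() + timeout\n",
        "    while time.monotonic() < deadline:\n",
        "        try:\n",
        "            with urllib.request.urlopen(url, timeout=5) as response:\n",
        "                if response.status == 200:\n",
        "                    return True\n",
        "        except (urllib.error.URLError, OSError):\n",
        "            pass  # Not ready yet (connection refused or an error status); retry after a pause.\n",
        "        time.sleep(interval)\n",
        "    return False\n",
        "```\n",
        "\n",
        "With the URL used in this notebook, `wait_for_http(\"http://localhost:9200\")` would return `True` as soon as Elasticsearch answers, or `False` if it has not answered within the timeout."
      ]
    },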
    {
      "cell_type": "markdown",
      "id": "verify-elasticsearch",
      "metadata": {
        "id": "verify-elasticsearch"
      },
      "source": [
        "Once the instance has been started, you can check if Elasticsearch is running by listing the processes. You should see several elasticsearch processes running."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "id": "verify-elasticsearch-code",
      "metadata": {
        "id": "verify-elasticsearch-code"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "root        3166    3164  0 22:36 ?        00:00:00 sudo -H -u daemon elasticsearch-8.15.1/bin/elast\n",
            "daemon      3167    3166  8 22:36 ?        00:00:05 /content/elasticsearch-8.15.1/jdk/bin/java -Xms4\n",
            "daemon      3251    3167 99 22:36 ?        00:00:56 /content/elasticsearch-8.15.1/jdk/bin/java -Des.\n",
            "daemon      3305    3251  0 22:36 ?        00:00:00 /content/elasticsearch-8.15.1/modules/x-pack-ml/\n",
            "root        3552     908  0 22:37 ?        00:00:00 /bin/bash -c ps -ef | grep elastic\n",
            "root        3554    3552  0 22:37 ?        00:00:00 grep elastic\n"
          ]
        }
      ],
      "source": [
        "!ps -ef | grep elastic"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "initialize-elasticsearch",
      "metadata": {
        "id": "initialize-elasticsearch"
      },
      "source": [
        "Verify that Elasticsearch is running by making a request to the cluster. The request authenticates as the default `elastic` superuser with the password `password`, which initializes the cluster so that you can make anonymous calls from here on.\n",
        "\n",
        "**WARNING**: Never pass credentials on the command line like this outside of a throwaway demo environment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "id": "initialize-elasticsearch-code",
      "metadata": {
        "id": "initialize-elasticsearch-code"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{\n",
            "  \"name\" : \"2583c25f4c9e\",\n",
            "  \"cluster_name\" : \"elasticsearch\",\n",
            "  \"cluster_uuid\" : \"cn_3_5SURQOcQB2BVm-37w\",\n",
            "  \"version\" : {\n",
            "    \"number\" : \"8.15.1\",\n",
            "    \"build_flavor\" : \"default\",\n",
            "    \"build_type\" : \"tar\",\n",
            "    \"build_hash\" : \"253e8544a65ad44581194068936f2a5d57c2c051\",\n",
            "    \"build_date\" : \"2024-09-02T22:04:47.310170297Z\",\n",
            "    \"build_snapshot\" : false,\n",
            "    \"lucene_version\" : \"9.11.1\",\n",
            "    \"minimum_wire_compatibility_version\" : \"7.17.0\",\n",
            "    \"minimum_index_compatibility_version\" : \"7.0.0\"\n",
            "  },\n",
            "  \"tagline\" : \"You Know, for Search\"\n",
            "}\n"
          ]
        }
      ],
      "source": [
        "!curl -u elastic:password -H 'Content-Type: application/json' -XGET http://localhost:9200/?pretty=true"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "qa-rag-elasticsearch",
      "metadata": {
        "id": "qa-rag-elasticsearch"
      },
      "source": [
        "## QA with RAG Using Elasticsearch\n",
        "\n",
        "Now, you'll perform question answering using Retrieval-Augmented Generation (RAG) with Elasticsearch by implementing the two stages in a RAG-based architecture:\n",
        "\n",
        "1. **Retrieval**: Retrieves relevant context based on the user's query.\n",
        "2. **Generation**: Uses the LLM to generate answers using the retrieved context."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "index-documents",
      "metadata": {
        "id": "index-documents"
      },
      "source": [
        "### Retrieval\n",
        "\n",
        "In this stage, you will perform the following steps:\n",
        "\n",
        "* **Create a sample dataset**: Use sample Pokémon data.\n",
        "* **Prepare documents for indexing**: Split documents into manageable chunks.\n",
        "* **Create embeddings of the data**: Convert text data into numerical vectors.\n",
        "* **Store the embeddings in the vector store**: Index the embeddings in Elasticsearch.\n",
        "* **Create a retriever**: Set up a retriever to fetch relevant documents.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "connect-elasticsearch",
      "metadata": {
        "id": "connect-elasticsearch"
      },
      "source": [
        "Initialize the Elasticsearch client by connecting to the local instance.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "id": "connect-elasticsearch-code",
      "metadata": {
        "id": "connect-elasticsearch-code"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Connected to Elasticsearch\n",
            "{'name': '2583c25f4c9e', 'cluster_name': 'elasticsearch', 'cluster_uuid': 'cn_3_5SURQOcQB2BVm-37w', 'version': {'number': '8.15.1', 'build_flavor': 'default', 'build_type': 'tar', 'build_hash': '253e8544a65ad44581194068936f2a5d57c2c051', 'build_date': '2024-09-02T22:04:47.310170297Z', 'build_snapshot': False, 'lucene_version': '9.11.1', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}\n"
          ]
        }
      ],
      "source": [
        "es_url = \"http://localhost:9200\"\n",
        "client = Elasticsearch(hosts=[es_url])\n",
        "\n",
        "# Verify connection\n",
        "if client.ping():\n",
        "    print(\"Connected to Elasticsearch\")\n",
        "else:\n",
        "    print(\"Could not connect to Elasticsearch\")\n",
        "\n",
        "print(client.info())"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "C35LOhwa7cXq",
      "metadata": {
        "id": "C35LOhwa7cXq"
      },
      "source": [
        "#### Create a sample dataset\n",
        "\n",
        "First, create some sample data to index. To do this, let's use descriptions of various Pokémon:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "id": "create-sample-data",
      "metadata": {
        "id": "create-sample-data"
      },
      "outputs": [],
      "source": [
        "data = [\n",
        "    {\n",
        "        \"name\": \"Bulbasaur\",\n",
        "        \"description\": \"Bulbasaur has a strange seed planted on its back at birth. The plant sprouts and grows with Bulbasaur.\",\n",
        "        \"type\": \"Grass/Poison\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Charmander\",\n",
        "        \"description\": \"Charmander obviously prefers hot places. When it rains, steam is said to spout from the tip of Charmander's tail.\",\n",
        "        \"type\": \"Fire\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Squirtle\",\n",
        "        \"description\": \"After birth, Squirtle's back swells and hardens into a shell. Squirtle powerfully sprays foam from its mouth.\",\n",
        "        \"type\": \"Water\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Pikachu\",\n",
        "        \"description\": \"When several Pikachu gather, their electricity could build and cause lightning storms.\",\n",
        "        \"type\": \"Electric\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Jigglypuff\",\n",
        "        \"description\": \"When Jigglypuff sings, it never pauses to breathe. If Jigglypuff is in battle against an opponent that does not easily fall asleep, it cannot breathe, endangering its life.\",\n",
        "        \"type\": \"Normal/Fairy\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Meowth\",\n",
        "        \"description\": \"Meowth adores round objects. It wanders the streets on a nightly basis to look for dropped loose change.\",\n",
        "        \"type\": \"Normal\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Psyduck\",\n",
        "        \"description\": \"While lulling its enemies with its vacant look, this wily Psyduck will use psychokinetic powers.\",\n",
        "        \"type\": \"Water\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Mewtwo\",\n",
        "        \"description\": \"Mewtwo was created by a scientist after years of horrific gene splicing and DNA engineering experiments.\",\n",
        "        \"type\": \"Psychic\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Snorlax\",\n",
        "        \"description\": \"Snorlax's daily routine consists of nothing more than eating and sleeping. It is such a docile Pokémon that children use Snorlax's expansive belly as a place to play.\",\n",
        "        \"type\": \"Normal\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Gengar\",\n",
        "        \"description\": \"Sometimes, on a dark night, your shadow thrown by a streetlight will suddenly and startlingly overtake you. It is actually a Gengar running past you.\",\n",
        "        \"type\": \"Ghost/Poison\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Lapras\",\n",
        "        \"description\": \"People have driven Lapras almost to the point of extinction. In the evenings, Lapras is said to sing plaintively as it seeks what few others of its kind still remain.\",\n",
        "        \"type\": \"Water/Ice\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Dragonite\",\n",
        "        \"description\": \"Dragonite is capable of circling the globe in just sixteen hours. It is a kindhearted Pokémon that leads lost ships in a storm to the safety of land.\",\n",
        "        \"type\": \"Dragon/Flying\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Ditto\",\n",
        "        \"description\": \"Ditto rearranges its cell structure to transform itself into other shapes. However, if Ditto tries to transform itself by relying on its memory, it may get details wrong.\",\n",
        "        \"type\": \"Normal\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Magikarp\",\n",
        "        \"description\": \"Magikarp is a pathetic excuse for a Pokémon that is only capable of flopping and splashing. This behavior prompted scientists to undertake research into Magikarp.\",\n",
        "        \"type\": \"Water\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Charizard\",\n",
        "        \"description\": \"Charizard flies around the sky in search of powerful opponents. Charizard breathes fire of such great heat that it melts anything.\",\n",
        "        \"type\": \"Fire/Flying\"\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Onix\",\n",
        "        \"description\": \"Onix burrows at high speed in search of food. The tunnels Onix leaves are used as homes by Diglett.\",\n",
        "        \"type\": \"Rock/Ground\"\n",
        "    }\n",
        "]"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "prepare-index-documents",
      "metadata": {
        "id": "prepare-index-documents"
      },
      "source": [
        "#### Preparing Documents for Indexing\n",
        "\n",
        "Splitting documents into smaller manageable chunks is ideal for efficient Elasticsearch indexing and search, especially for larger descriptions.\n",
        "\n",
        "The following code extracts each description along with its metadata (`name` and `type`), then uses `RecursiveCharacterTextSplitter` to break the descriptions into overlapping, token-based segments. With `chunk_size=512` and `chunk_overlap=256`, the short sample descriptions each fit in a single chunk; the splitting matters once you index longer texts. Each resulting chunk is paired with its metadata and is ready for indexing."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 21,
      "id": "split-documents",
      "metadata": {
        "id": "split-documents"
      },
      "outputs": [],
      "source": [
        "metadata = []\n",
        "content = []\n",
        "\n",
        "for doc in data:\n",
        "    content.append(doc[\"description\"])\n",
        "    metadata.append({\"name\": doc[\"name\"], \"type\": doc[\"type\"]})\n",
        "\n",
        "text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n",
        "    chunk_size=512, chunk_overlap=256\n",
        ")\n",
        "docs = text_splitter.create_documents(content, metadatas=metadata)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "initialize-embeddings",
      "metadata": {
        "id": "initialize-embeddings"
      },
      "source": [
        "#### Create Embeddings of the Data\n",
        "\n",
        "Embeddings are numerical representations (vectors) of text. Text with similar meaning will have similar embedding vectors. You'll use an embedding model to create the embedding vectors of the data.\n",
        "\n",
        "Initialize the embeddings using the `sentence-transformers/all-MiniLM-L12-v2` Hugging Face embedding model. Setting `device` to `cuda` runs the model on the GPU for faster computation, and you can customize encoding options such as `normalize_embeddings` through `encode_kwargs`.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "initialize-embeddings-code",
      "metadata": {
        "id": "initialize-embeddings-code"
      },
      "outputs": [],
      "source": [
        "embeddings = HuggingFaceEmbeddings(\n",
        "    model_name=\"sentence-transformers/all-MiniLM-L12-v2\",\n",
        "    model_kwargs={'device': 'cuda'},\n",
        "    encode_kwargs={'normalize_embeddings': False}\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "index-documents-code",
      "metadata": {
        "id": "index-documents-code"
      },
      "source": [
        "#### Store the Embeddings in the Vector Store\n",
        "\n",
        "Set up the `ElasticsearchStore` index using the `langchain-elasticsearch` integration. The retriever will query this same index later in the RAG chain.\n",
        "\n",
        "Run the following cell to start indexing the documents."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "id": "index-documents-code-cell",
      "metadata": {
        "id": "index-documents-code-cell"
      },
      "outputs": [],
      "source": [
        "index_name = \"es-rag-pokemon\"\n",
        "\n",
        "documents = ElasticsearchStore.from_documents(\n",
        "    docs,\n",
        "    embeddings,\n",
        "    index_name=index_name,\n",
        "    es_url=es_url\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "creating-retriever",
      "metadata": {
        "id": "creating-retriever"
      },
      "source": [
        "#### Create a Retriever\n",
        "\n",
        "Next, you'll create a retriever to fetch relevant documents based on user queries. You'll design a hybrid search query for Elasticsearch that combines both traditional keyword-based search and vector similarity search.\n",
        "\n",
        "* [**Keyword Matching using BM25**](https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables): Ensures that documents containing the exact or similar terms (with fuzziness) are retrieved. This is useful for precise term matching and accommodates typos.\n",
        "\n",
        "* [**Vector Similarity (KNN)**](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm): Retrieves documents that are semantically similar to the search query, even if they don't contain the exact search terms. This captures the meaning behind the words.\n",
        "\n",
        "\n",
        "So, let's define a hybrid query function that uses both techniques.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "id": "define-hybrid-query",
      "metadata": {
        "id": "define-hybrid-query"
      },
      "outputs": [],
      "source": [
        "def hybrid_query(search_query: str) -> Dict:\n",
        "    vector = embeddings.embed_query(search_query)\n",
        "    return {\n",
        "        # Keyword matching\n",
        "        \"query\": {\n",
        "            \"match\": {\n",
        "                \"text\": {\n",
        "                    \"query\": search_query,\n",
        "                    # Keyword matching with typo tolerance\n",
        "                    \"fuzziness\": \"AUTO\",\n",
        "                }\n",
        "            },\n",
        "        },\n",
        "        # K-Nearest Neighbors\n",
        "        \"knn\": {\n",
        "            # The default vector field name in LangChain is \"vector\"\n",
        "            \"field\": \"vector\",\n",
        "            \"query_vector\": vector,\n",
        "            \"k\": 5,\n",
        "            \"num_candidates\": 10,\n",
        "        }\n",
        "    }"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "rF35C94sgeBO",
      "metadata": {
        "id": "rF35C94sgeBO"
      },
      "source": [
        "Initialize the retriever."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 25,
      "id": "initialize-retriever",
      "metadata": {
        "id": "initialize-retriever"
      },
      "outputs": [],
      "source": [
        "retriever = ElasticsearchRetriever.from_es_params(\n",
        "    index_name=index_name,\n",
        "    body_func=hybrid_query,\n",
        "    content_field=\"text\",\n",
        "    url=es_url,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "SDhcyaqEgf-k",
      "metadata": {
        "id": "SDhcyaqEgf-k"
      },
      "source": [
        "Test whether the retriever works as intended by running a sample query."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "test-retriever",
      "metadata": {
        "id": "test-retriever"
      },
      "outputs": [],
      "source": [
        "retriever.invoke(\"Pikachu\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "creating-rag-chain",
      "metadata": {
        "id": "creating-rag-chain"
      },
      "source": [
        "### Generation\n",
        "\n",
        "In the generation stage, you prompt the LLM to answer the user's question. The retriever you created in the previous stage supplies the relevant context.\n",
        "\n",
        "You'll perform the following steps in this stage:\n",
        "\n",
        "* Load a predefined RAG prompt.\n",
        "* Define a function to format retrieved documents.\n",
        "* Handle Gemma formatting using a custom output parser.\n",
        "* Chain everything together."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "-IzsYiwwVK-5",
      "metadata": {
        "id": "-IzsYiwwVK-5"
      },
      "source": [
        "#### Load a predefined prompt\n",
        "\n",
        "You will use a [RAG prompt template](https://smith.langchain.com/hub/rlm/rag-prompt) from LangChain Hub for the predefined prompt. It is useful for chat, QA, or other applications that rely on passing context to an LLM.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 27,
      "id": "load-prompt",
      "metadata": {
        "id": "load-prompt"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "/usr/local/lib/python3.10/dist-packages/langsmith/client.py:354: LangSmithMissingAPIKeyWarning: API key must be provided when using hosted LangSmith API\n",
            "  warnings.warn(\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Prompt:\n",
            "\n",
            "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\n",
            "Question: {question} \n",
            "Context: {context} \n",
            "Answer:\n"
          ]
        }
      ],
      "source": [
        "prompt = hub.pull(\"rlm/rag-prompt\")\n",
        "print(f\"Prompt:\\n\\n{prompt.messages[0].prompt.template}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "6XyiBuYOC_a3",
      "metadata": {
        "id": "6XyiBuYOC_a3"
      },
      "source": [
        "#### The formatted context\n",
        "\n",
        "Define a function that joins the page content of the retrieved documents into a single string, with a blank line between documents."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "id": "format-docs",
      "metadata": {
        "id": "format-docs"
      },
      "outputs": [],
      "source": [
        "def format_docs(docs):\n",
        "    return \"\\n\\n\".join([doc.page_content for doc in docs])"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "Ev-Hnb1bhkRj",
      "metadata": {
        "id": "Ev-Hnb1bhkRj"
      },
      "source": [
        "#### Custom Gemma Output Parser\n",
        "\n",
        "Gemma's instruction-tuned models mark each turn with control tokens such as `<start_of_turn>model`. You'll create an output parser that returns only the text after the last such marker, or the full text if no marker is present."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 29,
      "id": "gemma-output-parser",
      "metadata": {
        "id": "gemma-output-parser"
      },
      "outputs": [],
      "source": [
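        "class GemmaOutputParser(BaseTransformOutputParser[str]):\n",
        "    # Returns only the text after the last model-turn marker.\n",
        "    def parse(self, text: str) -> str:\n",
        "        model_start_token = \"<start_of_turn>model\\n\"\n",
        "        # rfind returns -1 when the marker is absent\n",
        "        idx = text.rfind(model_start_token)\n",
        "        # No marker: return the text unchanged\n",
        "        return text[idx + len(model_start_token):] if idx > -1 else text"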
      ]
    },
    {
      "cell_type": "markdown",
      "id": "Ys_tR0JV_f3M",
      "metadata": {
        "id": "Ys_tR0JV_f3M"
      },
      "source": [
        "#### Chain everything together\n",
        "\n",
        "Finally, you'll set up the RAG chain that ties everything together while relying on LangChain's [LCEL paradigm](https://python.langchain.com/v0.1/docs/expression_language/why/).\n",
        "\n",
        "[The prompt](https://smith.langchain.com/hub/rlm/rag-prompt) you're using expects an input that includes two keys: `context` and `question`. The user only provides the question, so you need to obtain and format the relevant context using the retriever. Here's how it works:\n",
        "\n",
        "* **Retrieve Relevant Documents**: Use `retriever` to get the relevant documents based on the user's question.\n",
        "\n",
        "* **Format the Documents**: The retriever returns a list of document objects, which the prompt can't use directly, so you'll use the `format_docs` function to join their contents into a single string. This formatted string becomes the `context`. The pipe symbol (`|`) chains `retriever` and `format_docs`, resulting in the **formatted context**.\n",
        "\n",
        "* **Pass the Question**: Pass the **user's question** directly under the `question` key via `RunnablePassthrough`, which behaves like the identity function: it forwards its input through the chain unchanged. (It can also add keys to a dictionary input via `RunnablePassthrough.assign`, which this chain doesn't need.) To learn more about this, read [the official documentation](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.passthrough.RunnablePassthrough.html).\n",
        "\n",
        "* **Fill the Prompt**: Combine both the **formatted context** and the **user's question** into a single input dictionary for the prompt. The prompt template replaces its `{context}` and `{question}` placeholders with the values from this dictionary, producing a complete prompt that is ready to be sent to the language model.\n",
        "\n",
        "* **Generate Answer with the LLM**: The filled prompt is then passed to the `llm` (the Gemma model via Ollama). The LLM processes the prompt and generates a response based on the provided context and question.\n",
        "\n",
        "* **Parse the LLM's Output**: The raw output from the LLM might include control tokens or additional formatting specific to the Gemma model. You'll use the `GemmaOutputParser` to parse the LLM's output and extract the final answer.\n",
        "\n",
        "\n",
        "In simple terms, you gather the context related to the question, format it appropriately, and then provide both the formatted context and the question to the model to generate an informed answer.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 30,
      "id": "assemble-rag-chain",
      "metadata": {
        "id": "assemble-rag-chain"
      },
      "outputs": [],
      "source": [
        "# Create an actual chain\n",
        "\n",
        "rag_chain = (\n",
        "    # First, retrieve the documents that are relevant to the\n",
        "    # given query and format them into a single context string\n",
        "    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
        "    # The `context` and `question` are then passed to the prompt\n",
        "    | prompt\n",
        "    # The whole prompt with all the information is passed to the LLM\n",
        "    | llm\n",
        "    # The answer of the LLM is parsed by the class defined above\n",
        "    | GemmaOutputParser()\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "try-it-out",
      "metadata": {
        "id": "try-it-out"
      },
      "source": [
        "### Try It Out!\n",
        "\n",
        "Finally, let's ask some questions and see how the RAG chain performs."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 34,
      "id": "ask-question-1",
      "metadata": {
        "id": "ask-question-1"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "When several Pikachu gather, their electricity could build and cause lightning storms.  The strength of this electrical buildup increases with the number of Pikachu present. \n",
            "\n",
            "\n",
            "\n"
          ]
        }
      ],
      "source": [
        "question = \"What happens when several Pikachu gather?\"\n",
        "answer = rag_chain.invoke(question)\n",
        "print(answer)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 32,
      "id": "ask-question-2",
      "metadata": {
        "id": "ask-question-2"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Dragonite and Charizard can fly.  They are both powerful Pokémon known for their flying abilities. \n",
            "\n"
          ]
        }
      ],
      "source": [
        "question = \"Name a few Pokémon that can fly\"\n",
        "answer = rag_chain.invoke(question)\n",
        "print(answer)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 33,
      "id": "ask-question-3",
      "metadata": {
        "id": "ask-question-3"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Meowth loves round objects.  It searches for dropped coins at night. \n",
            "\n",
            "\n",
            "\n"
          ]
        }
      ],
      "source": [
        "question = \"What's a Pokémon that loves round objects?\"\n",
        "answer = rag_chain.invoke(question)\n",
        "print(answer)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "conclusion",
      "metadata": {
        "id": "conclusion"
      },
      "source": [
        "Congratulations! You've successfully built a local Retrieval-Augmented Generation (RAG) application using the quantized Gemma 2 model (Gemma 2 9B IT), LangChain, Ollama, and Elasticsearch.\n",
        "\n",
        "Feel free to explore further by:\n",
        "\n",
        "- Adding more data to Elasticsearch.\n",
        "- Experimenting with different Gemma models from the [Ollama Gemma 2 Model Catalog](https://ollama.com/library/gemma2).\n",
        "- Tweaking model parameters for better performance.\n",
        "- Checking out [Elastic Cloud deployment](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud) and learning how to obtain your Elastic credentials ([ELASTIC_CLOUD_ID](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id) and [ELASTIC_API_KEY](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key)) to set up a cloud instance instead.\n",
        "\n",
        "By following this tutorial, you're now equipped to build your own local RAG applications and explore the capabilities of Gemma models combined with LangChain and Elasticsearch.\n"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "name": "[Gemma_2]Using_with_Elasticsearch_and_LangChain.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
